Skip to content

gh-142659: Optimize set_swap_bodies for intersection_update#148155

Open
Siyet wants to merge 2 commits intopython:mainfrom
Siyet:gh-142659-optimize-set-replace-body
Open

gh-142659: Optimize set_swap_bodies for intersection_update#148155
Siyet wants to merge 2 commits intopython:mainfrom
Siyet:gh-142659-optimize-set-replace-body

Conversation

@Siyet
Copy link
Copy Markdown

@Siyet Siyet commented Apr 6, 2026

Summary

Replace the general-purpose set_swap_bodies() with a specialized set_replace_body() that exploits the invariant that the source argument is always a uniquely-referenced temporary set about to be discarded.

Follow-up to the observation in #132290 (comment) and #142659.

Changes

set_swap_bodies() was designed for arbitrary two-way swaps between any two sets, but it is only called from set_intersection_update() and set_intersection_update_multi_impl(), where the second argument (tmp) is always a freshly created temporary with Py_REFCNT == 1.

The new set_replace_body() exploits this invariant:

  • Skip atomic stores on src: the temporary is not visible to other threads, so plain assignments suffice (saves atomic fence overhead in the free-threaded build).
  • Skip copy_small_table for src: use plain memcpy instead of per-entry atomic stores when writing back to src's smalltable (Py_GIL_DISABLED path).
  • Remove frozenset hash swap: neither argument is ever a frozenset in these call sites (enforced via assert).
  • Simplify shared-marking logic: src is never shared (enforced via assert), so only one direction of the shared-marking check is needed — propagate shared status from dst to src for proper deallocation of old entries.
  • Remove unused Py_hash_t h variable.

All assumptions are guarded by assert() to document and enforce the contract.

@bedevere-app
Copy link
Copy Markdown

bedevere-app bot commented Apr 6, 2026

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@eendebakpt
Copy link
Copy Markdown
Contributor

@Siyet Could you add a few benchmarks for the affected cases?

@Siyet
Copy link
Copy Markdown
Author

Siyet commented Apr 6, 2026

Benchmarks on a dedicated server (AMD EPYC 1 vCPU, 2 GB RAM, Ubuntu 24.04, no background load, taskset -c 0, 50 rounds):

Default build (with GIL):

Benchmark Before (ns) After (ns) Change
small (5 & 5) 1010 ± 19 1022 ± 21 +1.2%
medium (100 & 100) 5010 ± 76 4930 ± 58 -1.6%
large (10k & 10k) 1141519 ± 12612 1137019 ± 11591 -0.4%
multi-arg (100 & 100 & 50) 6042 ± 50 6048 ± 88 +0.1%
empty result (100 & 0) 4323 ± 47 4301 ± 59 -0.5%

Free-threading build (--disable-gil):

Benchmark Before (ns) After (ns) Change
small (5 & 5) 1910 ± 18 1931 ± 22 +1.1%
medium (100 & 100) 6716 ± 63 6650 ± 91 -1.0%
large (10k & 10k) 2407935 ± 22615 2404563 ± 25709 -0.1%
multi-arg (100 & 100 & 50) 8371 ± 91 8284 ± 82 -1.0%
empty result (100 & 0) 5896 ± 111 5828 ± 65 -1.1%

The end-to-end improvement is modest (~1%) because set_replace_body is a small fraction of the total intersection_update time — most time is spent in set_intersection (iteration, hashing, insertion). The main value of this change is code clarity and correctness: the new function documents its contract via assertions, removes dead branches (frozenset hash swap, bidirectional shared-marking), and skips unnecessary atomic operations on the uniquely-referenced temporary.

Benchmark script
"""Benchmark set.intersection_update for gh-142659.

Measures the performance of intersection_update across different set sizes
and usage patterns. Each benchmark recreates the set on every iteration
to measure the full intersection_update path including set_replace_body.
"""
import statistics
import sys
import timeit

ROUNDS = 50

BENCHMARKS = {
    "small (5 & 5)": {
        "setup": "",
        "stmt": "set(range(5)).intersection_update({3, 4, 5, 6, 7})",
        "number": 2_000_000,
    },
    "medium (100 & 100)": {
        "setup": "b = set(range(50, 150))",
        "stmt": "set(range(100)).intersection_update(b)",
        "number": 500_000,
    },
    "large (10k & 10k)": {
        "setup": "b = set(range(5000, 15000))",
        "stmt": "set(range(10000)).intersection_update(b)",
        "number": 2_000,
    },
    "multi-arg (100 & 100 & 50)": {
        "setup": "b = set(range(50, 150)); c = set(range(75, 125))",
        "stmt": "set(range(100)).intersection_update(b, c)",
        "number": 500_000,
    },
    "empty result (100 & 0 overlap)": {
        "setup": "b = set(range(200, 300))",
        "stmt": "set(range(100)).intersection_update(b)",
        "number": 500_000,
    },
}


def main():
    print(f"Python {sys.version}")
    print(f"Rounds: {ROUNDS}")
    print()
    print(f"{'Benchmark':<32} {'Mean (ns)':>10} {'Stdev':>10} {'Min':>10} {'Max':>10}")
    print("-" * 78)

    for name, bench in BENCHMARKS.items():
        times = timeit.repeat(
            bench["stmt"],
            setup=bench["setup"],
            number=bench["number"],
            repeat=ROUNDS,
        )
        per_iter = [t / bench["number"] * 1e9 for t in times]
        mean = statistics.mean(per_iter)
        stdev = statistics.stdev(per_iter)
        print(
            f"{name:<32} {mean:>10.1f} {stdev:>9.1f} {min(per_iter):>10.1f} {max(per_iter):>10.1f}"
        )


if __name__ == "__main__":
    main()

aleksandr.tseluyko added 2 commits April 6, 2026 16:07
Replace the general-purpose set_swap_bodies() with a specialized
set_replace_body() that exploits the invariant that src is always a
uniquely-referenced temporary about to be discarded.
@Siyet Siyet force-pushed the gh-142659-optimize-set-replace-body branch from d3f1a20 to 37a95e6 Compare April 6, 2026 13:08

assert(!PyType_IsSubtype(Py_TYPE(dst), &PyFrozenSet_Type));
assert(!PyType_IsSubtype(Py_TYPE(src), &PyFrozenSet_Type));
assert(Py_REFCNT(src) == 1);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert(Py_REFCNT(src) == 1);
assert(_PyObject_IsUniquelyReferenced(src));

moving dst's old contents into src for proper cleanup on Py_DECREF.

t=set(a); a.clear(); a.update(b); b.clear(); b.update(t); del t
The caller guarantees that src is a uniquely-referenced temporary set
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the called is required to cleanup src immediately afterwards, maybe it is cleaner to let the set_replace_body steal the reference and do the decref.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants